Abstract: Spectrum sensing is a major challenge in
cognitive radio (CR). We propose to learn a local feature and use
it as prior knowledge to improve the detection performance.
We define the local feature as the leading eigenvector derived 
from the received signal samples. A feature learning algorithm 
(FLA) is proposed to learn the feature blindly. Then, with local 
feature as the prior knowledge, we propose the feature template 
matching algorithm (FTM) for spectrum sensing. We use the 
discrete Karhunen-Loeve transform (DKLT) to show that such a 
feature is robust against noise and has maximum effective signal- 
to-noise ratio (SNR). Captured real-world data shows that the 
learned feature is very stable over time. It is almost unchanged 
in 25 seconds. Then, we test the detection performance of the 
FTM in very low SNR. Simulation results show that the FTM is 
about 2 dB better than the blind algorithms, and the FTM does 
not have the noise uncertainty problem. 

I. Introduction 

The radio frequency spectrum is fully allocated to the primary users
(PU), but its utilization rate is low [1], as low as 15% [2],
[3]. The concept of cognitive radio (CR) was proposed so that
the secondary users (SU) can occupy the spectrum left unused
by the PU, thereby improving the spectrum utilization rate.
The CR requires the SU to detect the presence of the PU within
a short time at very low signal-to-noise ratio (SNR); this task
is spectrum sensing. IEEE 802.22 is the first IEEE working
group to embed the CR technology [4] and has triggered much
research.

Many spectrum sensing algorithms have been proposed.
Take DTV signal sensing as an example. Specific features
are often defined from the spectral information, such as the pilot
tone [5], the spectrum shape [6] and cyclostationarity [7].
Generally speaking, these algorithms are robust and perform
well when the features can be assumed to be universal
to all the SUs. However, this assumption does not hold in
practice. Fig. 1 shows the DTV spectrum measured [8] at
different locations. The spectral features
are location dependent, due to different channel characteristics,
synchronization mismatch, etc. Therefore, we cannot rely
on pre-determined prior knowledge of the signals for spectrum
sensing.

Energy based algorithms do not have this problem. Essentially,
they require prior knowledge
of the noise instead. However, the noise uncertainty problem [9]
limits their performance. Purely blind
algorithms have been proposed in [10], [11], such as the
maximum eigenvalue to minimum eigenvalue ratio (MME) algorithm.
They require no noise information, so the noise uncertainty
problem is avoided.

Fig. 1. Spectrum measured at different locations in Washington D.C.
Left: 'Single Family Home'; Right: 'Apartment (High-Rise)'. The pilot tones
are located at different frequencies. The two spectra suffer different
frequency selective fading.

In this paper, we propose to use learned prior knowledge to 
improve detection performance. The learned prior knowledge 
is the leading eigenvector derived from the received signal's 
sample covariance matrix, using the discrete Karhunen-Loeve 
transform (DKLT). Similar to the terminology of the pattern 
recognition in machine learning, we define the leading eigen- 
vector as signal feature. We first propose a feature learning 
algorithm (FLA) to acquire the local feature blindly. Then, we 
propose a feature template matching algorithm (FTM) which 
uses the learned feature for spectrum sensing. The leading
eigenvector, i.e., the feature, is optimum in signal representation
[12] and most reliable when the distribution of the signal is
unknown [13]. In analogy with the recognition of pattern
features in image, speech, etc., spectrum sensing is the
recognition of the PU feature at the receiver. We will show
that:

1) The feature is a robust approximation of the non-white
wide-sense stationary (WSS) signal against the white
Gaussian noise (WGN).
2) The feature has maximum effective SNR.
3) The proposed algorithms are immune to the noise
uncertainty problem.

We use both simulated data and real-world data to demonstrate
that the feature is robust and stable against noise and can
be learned blindly even at very low SNR. DTV samples [8] are
used to compare the detection performance of the FTM and the
MME. The simulation results show that, to achieve the same
detection performance, the minimum required SNR for the
FTM is about 2 dB lower than that of the MME. With the
feature as prior knowledge, the detection
performance can thus be improved.

The paper is organized as follows. The FLA and the FTM 
are presented in Section II. The theoretical background of 
the feature is introduced in Section III. Simulation results are 
shown in Section IV and conclusions are made in Section V. 

II. Problem Formulation and the Proposed 
Algorithms 

The spectrum sensing problem can be modeled as follows.
Let $x(t) = s(t) + n(t)$ represent the received signal at the SU,
with $s(t)$ the PU signal and $n(t)$ the WGN. Both $s(t)$ and
$n(t)$ are independent zero-mean random processes, but $s(t)$
is non-white WSS while $n(t)$ is white Gaussian. Assume the
frequency band being sensed has bandwidth $B$ and is centered at frequency
$f_c$. After Nyquist sampling with period $T_s < 1/B$, we can
represent the received signal in discrete form: $x[n] =
x(nT_s)$, $s[n] = s(nT_s)$ and $w[n] = n(nT_s)$. The spectrum
sensing problem has two hypotheses: $H_0$, signal does not exist;
and $H_1$, signal exists. The received discrete signal under
the two hypotheses is therefore:

$$H_0 : x[n] = w[n] \tag{1}$$

$$H_1 : x[n] = s[n] + w[n] \tag{2}$$

Two probabilities are of interest: the detection probability,
$P_d(H_1 \mid x[n] = s[n] + w[n])$, and the false alarm probability,
$P_f(H_1 \mid x[n] = w[n])$.

It is assumed that the learning and sensing processes are
performed within the channel coherence time, and that a
pre-whitening filter precedes any processing.

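Let $\mathbf{x}_n$, $\mathbf{s}_n$ and $\mathbf{w}_n$ be random vectors consisting of $N$
samples of $x[n]$, $s[n]$ and $w[n]$, respectively:

$$\mathbf{x}_n = [x[n], x[n+1], \cdots, x[n+N-1]]^T \tag{3}$$

$$\mathbf{s}_n = [s[n], s[n+1], \cdots, s[n+N-1]]^T \tag{4}$$

$$\mathbf{w}_n = [w[n], w[n+1], \cdots, w[n+N-1]]^T \tag{5}$$

where $(\cdot)^T$ denotes matrix transpose. Using $E[\cdot]$ as the
notation for expectation, we have the corresponding covariance
matrices:

$$\mathbf{R}_x = E[\mathbf{x}_n \mathbf{x}_n^T] \tag{6}$$

$$\mathbf{R}_s = E[\mathbf{s}_n \mathbf{s}_n^T] \tag{7}$$

$$\mathbf{R}_w = E[\mathbf{w}_n \mathbf{w}_n^T] \tag{8}$$

Since $s[n]$ and $w[n]$ are independent, we have:

$$\mathbf{R}_x = \mathbf{R}_s + \mathbf{R}_w \tag{9}$$

Since $w[n]$ is WGN, we have:

$$\mathbf{R}_w = \sigma_n^2 \mathbf{I}_N \tag{10}$$

where $\sigma_n^2$ is the noise variance and $\mathbf{I}_N$ is the $N \times N$ identity matrix.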

If we perform the eigen-decomposition of the covariance matrix
$\mathbf{R}_x$, we obtain a set of eigenvalues $\{\lambda_1, \lambda_2, \cdots, \lambda_N\}$ and
eigenvectors $\{\phi_1, \phi_2, \cdots, \phi_N\}$, satisfying:

$$\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_N \tag{11}$$

and

$$\mathbf{R}_x \phi_i = \lambda_i \phi_i, \quad i = 1, 2, \cdots, N \tag{12}$$

In the terminology of pattern recognition, $\{\phi_i\}$ are called
features, and the process of calculating them is called feature
extraction. Since our algorithms deal only with the leading
eigenvector, in this paper only $\phi_1$ is called the feature, for
brevity.
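As an illustrative sketch (not part of the proposed algorithms), the following NumPy snippet extracts the feature from a known covariance matrix and checks that adding white noise as in (9) and (10) shifts the leading eigenvalue by the noise variance without changing the leading eigenvector. The helper name `leading_eigenvector`, the Toeplitz covariance $0.9^{|i-j|}$ and the noise variance are assumptions chosen only for this example.

```python
import numpy as np


def leading_eigenvector(R):
    """Return (lambda_1, phi_1) of a symmetric covariance matrix R."""
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]


N = 8
idx = np.arange(N)
# Hypothetical non-white WSS signal: covariance decays as 0.9**|i-j|.
R_s = 0.9 ** np.abs(idx[:, None] - idx[None, :])
sigma2 = 4.0                      # assumed noise variance (signal power per sample is 1)
R_x = R_s + sigma2 * np.eye(N)    # equations (9) and (10)

lam_s, phi_s = leading_eigenvector(R_s)
lam_x, phi_x = leading_eigenvector(R_x)

# The eigenvector sign is arbitrary, so compare through the absolute inner product.
alignment = abs(float(phi_s @ phi_x))
eigenvalue_shift = lam_x - lam_s
```

Here `alignment` measures how closely the noisy and noise-free features agree, and `eigenvalue_shift` can be compared with `sigma2`.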

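The exact covariance matrix cannot be obtained in
practice because we do not know the exact expectations
of the random processes. Alternatively, if we define
$X_m$ as the $m$-th sensing segment, consisting of the $N_s + N - 1$
consecutive samples that form the vectors $\mathbf{x}_i$,
$i = 1 + N_s(m-1), \cdots, N_s m$, we can form an approximate
sample covariance matrix $\hat{\mathbf{R}}_{x,m}$ by averaging:

$$\hat{\mathbf{R}}_{x,m} = \frac{1}{N_s} \sum_{i=1+N_s(m-1)}^{N_s m} \mathbf{x}_i \mathbf{x}_i^T \tag{13}$$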
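A minimal sketch of the averaging step is given below. It assumes that a segment of $N_s + N - 1$ samples is cut into $N_s$ overlapping length-$N$ vectors, one per starting sample; the function name `sample_covariance` and the test values are hypothetical.

```python
import numpy as np


def sample_covariance(segment, N):
    """Average x_i x_i^T over all length-N windows of one sensing segment."""
    segment = np.asarray(segment, dtype=float)
    n_windows = segment.size - N + 1                          # N_s windows
    idx = np.arange(n_windows)[:, None] + np.arange(N)[None, :]
    windows = segment[idx]                                    # row i is the vector x_i
    return windows.T @ windows / n_windows


rng = np.random.default_rng(0)
N, N_s = 4, 20000
# Noise-only segment with variance 2: the estimate should be close to 2 * I.
noise_segment = np.sqrt(2.0) * rng.standard_normal(N_s + N - 1)
R_hat = sample_covariance(noise_segment, N)
```

Because the running sum can be updated as each new sample arrives, this step lends itself to real-time computation.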

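Now we use $\varphi_m$ to represent the leading eigenvector of the
sample covariance matrix $\hat{\mathbf{R}}_{x,m}$ of segment $m$, i.e., the feature of
segment $m$. We use intuitive template matching to measure the
similarity of the features of segments $i$ and $j$:

$$\rho_{i,j} = \max_{l} \left| \sum_{k=1}^{N} \varphi_i[k]\, \varphi_j[k+l] \right| \tag{14}$$

where the maximum is taken over the relative shift $l$ between the two features.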
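The template matching can be sketched as a maximum absolute cross-correlation between two unit-norm features. The range of shifts is an assumption here: all shifts with any overlap are searched, and entries outside the template are treated as zero. The function name `similarity` and the test vectors are hypothetical.

```python
import numpy as np


def similarity(phi_a, phi_b):
    """Largest absolute cross-correlation of two features over all relative shifts."""
    phi_a = np.asarray(phi_a, dtype=float)
    phi_b = np.asarray(phi_b, dtype=float)
    phi_a = phi_a / np.linalg.norm(phi_a)   # unit norm, so the result lies in [0, 1]
    phi_b = phi_b / np.linalg.norm(phi_b)
    xcorr = np.correlate(phi_a, phi_b, mode="full")   # one value per relative shift
    return float(np.max(np.abs(xcorr)))


template = np.array([0.0, 1.0, 2.0, 1.0, 0.0, 0.0])
shifted = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 1.0])     # same shape, moved by two samples
unrelated = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0])

rho_same = similarity(template, template)
rho_shifted = similarity(template, shifted)
rho_flipped = similarity(template, -template)           # eigenvector sign is arbitrary
rho_unrelated = similarity(template, unrelated)
```

Taking the absolute value makes the measure insensitive to the arbitrary sign of an eigenvector, and searching over shifts makes it insensitive to a time offset between segments.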

Based on the above notation and the concept of feature, we
propose the FLA and the FTM:
a) Algorithm 1, the FLA:

1) Collect two consecutive sensing segments $X_i$, $X_{i+1}$, with
$N_s + N - 1$ samples each.

2) Compute the covariance matrix of each segment according
to (6) and (13).

3) Extract the features $\varphi_i$ and $\varphi_{i+1}$ of the corresponding
segments.

4) Compute the similarity $\rho_{i,i+1}$ between these two features
using (14).

5) If $\rho_{i,i+1} > T_e$, then the feature is learned and stored as
$\varphi_{\text{feature}}$.

Here $T_e$ is a threshold determined from the
similarity of consecutive noise segments.
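The five steps of the FLA can be sketched as follows. This is a sketch under stated assumptions rather than a reference implementation: the helper names are hypothetical, the similarity searches all overlapping shifts, the feature of the later segment is the one kept, and the test signal is a moving sum of white noise (a simple non-white WSS process) at 0 dB SNR.

```python
import numpy as np


def extract_feature(segment, N):
    """Leading eigenvector of the sample covariance of one sensing segment."""
    segment = np.asarray(segment, dtype=float)
    n_windows = segment.size - N + 1
    idx = np.arange(n_windows)[:, None] + np.arange(N)[None, :]
    windows = segment[idx]
    R_hat = windows.T @ windows / n_windows
    _, eigvecs = np.linalg.eigh(R_hat)          # ascending eigenvalues
    return eigvecs[:, -1]                       # unit-norm leading eigenvector


def similarity(phi_a, phi_b):
    """Largest absolute cross-correlation over all relative shifts (assumed range)."""
    return float(np.max(np.abs(np.correlate(phi_a, phi_b, mode="full"))))


def fla(segment_a, segment_b, N, T_e):
    """Return (feature or None, similarity) for two consecutive segments."""
    phi_a = extract_feature(segment_a, N)
    phi_b = extract_feature(segment_b, N)
    rho = similarity(phi_a, phi_b)
    learned = phi_b if rho > T_e else None      # keeping phi_b is an assumption
    return learned, rho


rng = np.random.default_rng(0)
N, N_s, L = 8, 20000, 8
seg_len = N_s + N - 1
white = rng.standard_normal(2 * seg_len + L - 1)
signal = np.convolve(white, np.ones(L), mode="valid")    # non-white, power L per sample
noise = np.sqrt(L) * rng.standard_normal(signal.size)    # same power: 0 dB SNR
x = signal + noise

feature, rho = fla(x[:seg_len], x[seg_len:], N, T_e=0.9)
```

No noise power estimate enters any step; only the direction of the leading eigenvector is used.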

Using the learned feature $\varphi_{\text{feature}}$ as prior knowledge,
we propose the FTM:

b) Algorithm 2, the FTM:

1) Collect $N_s + N - 1$ consecutive samples.

2) Compute the covariance matrix of this segment according
to (6) and (13).

3) Extract the feature $\varphi_{\text{current}}$ of the current segment.

4) Compute the similarity $\rho_{\text{feature},\text{current}}$.

5) If $\rho_{\text{feature},\text{current}} > T_f$, then hypothesis $H_1$ is claimed.
Otherwise, hypothesis $H_0$ is claimed.

Both $T_e$ and $T_f$ are thresholds to be set according to noise
statistics. As will be shown later, $T_e$ and $T_f$ are independent
of the noise energy and of the SNR, so there is no noise uncertainty
problem in setting $T_e$ or $T_f$.
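A sketch of the FTM decision is given below, together with a check that the test statistic does not change when a noise-only segment is rescaled. The helper names, the threshold value and the moving-sum test signal are assumptions; the stored feature is taken from the exact covariance of that test signal purely for illustration.

```python
import numpy as np


def extract_feature(segment, N):
    """Leading eigenvector of the sample covariance of one sensing segment."""
    segment = np.asarray(segment, dtype=float)
    n_windows = segment.size - N + 1
    idx = np.arange(n_windows)[:, None] + np.arange(N)[None, :]
    windows = segment[idx]
    R_hat = windows.T @ windows / n_windows
    return np.linalg.eigh(R_hat)[1][:, -1]


def similarity(phi_a, phi_b):
    """Largest absolute cross-correlation over all relative shifts (assumed range)."""
    return float(np.max(np.abs(np.correlate(phi_a, phi_b, mode="full"))))


def ftm_detect(segment, feature, N, T_f):
    """Return (H1 claimed?, similarity) for one sensing segment."""
    rho = similarity(feature, extract_feature(segment, N))
    return rho > T_f, rho


rng = np.random.default_rng(1)
N, N_s, L = 8, 20000, 8
seg_len = N_s + N - 1
lags = np.abs(np.arange(N)[:, None] - np.arange(N)[None, :])
R_s = np.maximum(L - lags, 0).astype(float)      # exact covariance of the moving-sum signal
feature = np.linalg.eigh(R_s)[1][:, -1]          # stand-in for a learned feature

signal = np.convolve(rng.standard_normal(seg_len + L - 1), np.ones(L), mode="valid")
noise = np.sqrt(L) * rng.standard_normal(seg_len)           # 0 dB SNR

decision_h1, rho_h1 = ftm_detect(signal + noise, feature, N, T_f=0.9)
_, rho_noise = ftm_detect(noise, feature, N, T_f=0.9)
_, rho_noise_scaled = ftm_detect(4.0 * noise, feature, N, T_f=0.9)
```

Scaling the noise multiplies the sample covariance by a constant, which changes its eigenvalues but not its eigenvectors, so `rho_noise` and `rho_noise_scaled` agree.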

So far we have proposed the FLA and FTM. In the next 
section we will show why we use leading eigenvector as 
feature. 

III. Theoretical Background 

The theoretical background of the FLA and FTM lies in the
DKLT, which explains why we define the feature as the leading
eigenvector. We follow the description in [14] to give a brief review
of the DKLT. If we consider a zero-mean random sequence
$\{x[n]; n = 1, \cdots, N\}$, this sequence can be expanded in any
set of orthonormal basis functions $f_i[n]$ as:

$$x[n] = \kappa_1 f_1[n] + \kappa_2 f_2[n] + \cdots + \kappa_N f_N[n] \tag{15}$$

where the $\kappa_i$ are the coefficients of the expansion and "orthonormal"
means that the functions satisfy the relation

$$\sum_{n=1}^{N} f_i^*[n] f_j[n] = \begin{cases} 1, & i = j \\ 0, & i \neq j \end{cases} \tag{16}$$

where $*$ denotes conjugate. Following (15) and (16), the
coefficients are given by:

$$\kappa_i = \sum_{n=1}^{N} f_i^*[n]\, x[n] \tag{17}$$

It is desired to find a particular orthonormal set of functions
such that:

$$E[\kappa_i \kappa_j^*] = \begin{cases} \lambda_i, & i = j \\ 0, & i \neq j \end{cases} \tag{18}$$

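With such a set, the coefficients are uncorrelated. Define the random
vector $\mathbf{x} = [x[1], x[2], \cdots, x[N]]^T$, the coefficient vector $\boldsymbol{\kappa} =
[\kappa_1, \kappa_2, \cdots, \kappa_N]^T$, and the matrix

$$\boldsymbol{\Phi} = [\phi_1, \phi_2, \cdots, \phi_N] \tag{19}$$

where

$$\phi_i = [f_i[1], f_i[2], \cdots, f_i[N]]^T \tag{20}$$

From (16), $\boldsymbol{\Phi}$ is a unitary matrix such that $\boldsymbol{\Phi}^{*T} \boldsymbol{\Phi} = \mathbf{I}_N$.
(15) and (17) can now be expressed in matrix form as:

$$\mathbf{x} = \boldsymbol{\Phi} \boldsymbol{\kappa} \tag{21}$$

and

$$\boldsymbol{\kappa} = \boldsymbol{\Phi}^{*T} \mathbf{x} \tag{22}$$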

Equations (21) and (22) have the following interpretation.
If we consider the sequence $x[n]$ as a vector $\mathbf{x}$ in an $N$-dimensional
space, then the $\kappa_i$ can be regarded as the components of
the same vector with respect to a rotated coordinate system.
If we choose the $\phi_i$ as the eigenvectors of the covariance matrix

$$\mathbf{R}_x = E[\mathbf{x} \mathbf{x}^{*T}] \tag{23}$$

then the resulting $\kappa_i$ satisfy (18). Therefore, the desired set
of basis functions in (15) is determined by the eigenvectors
of the covariance matrix $\mathbf{R}_x$:

$$\mathbf{R}_x \phi_i = \lambda_i \phi_i \tag{24}$$

Thus, in the new coordinate system, the eigenvectors $\phi_i$ determine
the directions, while the eigenvalues $\lambda_i$ determine the signal
energy in the corresponding directions.

The transformation in (17) with such basis functions is the
DKLT, and (15) is called the Karhunen-Loeve expansion for
the random process. The DKLT is the only transformation that
results in (18).
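The decorrelating property of the DKLT can be checked numerically with a short sketch. The covariance $0.8^{|i-j|}$, the number of realizations and all variable names are assumptions made for the example; real-valued data are used, so the conjugate transpose reduces to the transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n_vectors = 6, 5000
idx = np.arange(N)
R_true = 0.8 ** np.abs(idx[:, None] - idx[None, :])        # hypothetical non-white covariance
X = rng.multivariate_normal(np.zeros(N), R_true, size=n_vectors)   # rows are realizations of x

R_hat = X.T @ X / n_vectors                  # sample version of the covariance matrix
lam, Phi = np.linalg.eigh(R_hat)
lam, Phi = lam[::-1], Phi[:, ::-1]           # descending order, as in (11)

K = X @ Phi                                  # each row is kappa = Phi^T x
X_rec = K @ Phi.T                            # each row is x = Phi kappa
C_kappa = K.T @ K / n_vectors                # sample covariance of the coefficients

off_diagonal = C_kappa - np.diag(np.diag(C_kappa))
```

With the eigenvectors as basis, `C_kappa` is diagonal with the eigenvalues on its diagonal, and `X_rec` reproduces `X`.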

A. Properties of Feature and DKLT 

The DKLT has many useful properties. They have been successfully
used in principal component analysis (PCA) [15],
singular spectrum analysis (SSA) [16], pattern recognition
[17], etc. We list two of the properties relevant to spectrum sensing.

Property 1:

Equation (15) is the optimal linear approximation
of the random process if the expansion is truncated to
use $M < N$ orthonormal basis functions:

$$\hat{x}[n] = \sum_{i=1}^{M} \kappa_i f_i[n]; \quad M < N \tag{25}$$

Property 2:

The leading eigenvector $\phi_1$ is determined by the direction
with the largest signal energy. For any $\lambda_1 > \sigma_n^2$, $\phi_1$ will remain
almost the same.

Property 2 has a geometric explanation, illustrated for a two-dimensional
case in Fig. 2. Assume we have $2 \times 1$ random vectors
$\mathbf{x}_{s+n} = \mathbf{x}_s + \mathbf{x}_n$, where $\mathbf{x}_s$ is a vectorized sine sequence and $\mathbf{x}_n$
is a vectorized WGN sequence. The SNR is set to 0 dB. There
are 1000 samples of each random vector in Fig. 2. Now
we use the DKLT to set a new X axis for each set of random vector
samples such that $\lambda_1$ is strongest along the corresponding new
X axis. It can be seen that the new X axes for $\mathbf{x}_s$ (SNR = $\infty$ dB)
and $\mathbf{x}_{s+n}$ (SNR = 0 dB) are almost the same. The X axis for $\mathbf{x}_n$
(SNR = $-\infty$ dB), however, is rotated by some random angle.
This is because WGN has almost the same energy
in every direction. The new X axis for noise is random and
unpredictable, but the direction for the signal is very robust, as
long as $\lambda_1 > \sigma_n^2$.
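The two-dimensional case can be reproduced with a short sketch. The way the sine sequence is vectorized (pairs of adjacent samples), its frequency, and the 0 dB noise level are assumptions made for this example; `principal_axis_deg` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(1)
n = np.arange(1001)
sine = np.sqrt(2.0) * np.sin(0.3 * n)                  # roughly unit average power
X_s = np.column_stack([sine[:-1], sine[1:]])           # 1000 vectors [s[n], s[n+1]]
X_n = rng.standard_normal(X_s.shape)                   # unit-variance WGN: 0 dB SNR
X_sn = X_s + X_n


def principal_axis_deg(X):
    """Angle (0 to 180 degrees) of the leading eigenvector of the sample covariance."""
    R_hat = X.T @ X / X.shape[0]
    v = np.linalg.eigh(R_hat)[1][:, -1]
    return float(np.degrees(np.arctan2(v[1], v[0])) % 180.0)


angle_s = principal_axis_deg(X_s)      # clean signal
angle_sn = principal_axis_deg(X_sn)    # signal plus noise
angle_n = principal_axis_deg(X_n)      # noise only: direction is essentially arbitrary
```

For adjacent samples of a slowly varying sine, the leading direction lies along the 45 degree diagonal, and the noisy estimate `angle_sn` stays close to it, while `angle_n` depends on the particular noise realization.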




Fig. 2. Illustration of Property 2. 



Property 2 helps us to conclude that, among all eigenvectors,
the leading eigenvector is the most robust against noise. Together
with Property 1, it follows that the leading eigenvector is
also the optimal approximation of the original signal, by simply setting
$M$ to 1 in (25). Moreover, since no estimate of the signal energy or noise energy
is used in the entire process [18], there is no
noise uncertainty problem for feature learning.
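The case $M = 1$ can be illustrated numerically. In the sketch below, each eigenvector in turn is used as the single basis function, and the mean squared approximation error is compared; the covariance model and all names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n_vectors = 6, 4000
idx = np.arange(N)
R_true = 0.8 ** np.abs(idx[:, None] - idx[None, :])        # hypothetical non-white covariance
X = rng.multivariate_normal(np.zeros(N), R_true, size=n_vectors)

R_hat = X.T @ X / n_vectors
lam, Phi = np.linalg.eigh(R_hat)
lam, Phi = lam[::-1], Phi[:, ::-1]                         # descending eigenvalues


def rank_one_mse(X, basis_vector):
    """Mean squared error when every x is approximated by kappa * basis_vector."""
    kappa = X @ basis_vector
    X_hat = np.outer(kappa, basis_vector)
    return float(np.mean(np.sum((X - X_hat) ** 2, axis=1)))


mse_per_vector = np.array([rank_one_mse(X, Phi[:, i]) for i in range(N)])
best_index = int(np.argmin(mse_per_vector))
```

The error left by the leading eigenvector equals the sum of the remaining eigenvalues, which is the smallest error any single eigenvector can achieve.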

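From another point of view, the effective SNR on the leading
eigenvector is higher than the original SNR. If we apply the DKLT to
$s[n]$ and $w[n]$, we have eigenvalues $\lambda_{s,i}$ for $s[n]$ and $\lambda_{w,i}$
for $w[n]$, and eigenvectors $\phi_{s,i}$ for $s[n]$ and $\phi_{w,i}$ for $w[n]$. The
SNR for $x[n]$ is:

$$\mathrm{SNR}_x = \frac{\sum_{i=1}^{N} \lambda_{s,i}}{\sum_{i=1}^{N} \lambda_{w,i}} \tag{26}$$

Suppose we only use the leading eigenvector $\phi_1$ to
approximate $x[n]$ by $\hat{x}[n]$ in (25). Since $w[n]$ is white, $\lambda_{w,i} = \sigma_n^2$
for all $i$, and the SNR for $\hat{x}[n]$ is therefore

$$\mathrm{SNR}_{\hat{x}} = \frac{\lambda_{s,1}}{\sigma_n^2} \tag{27}$$

The SNR gain after using the leading eigenvector of the DKLT
is:

$$G_{\mathrm{SNR}} = \frac{\mathrm{SNR}_{\hat{x}}}{\mathrm{SNR}_x} = \frac{N \lambda_{s,1}}{\sum_{i=1}^{N} \lambda_{s,i}} \tag{28}$$

Since $\lambda_{s,1} \geq \lambda_{s,2} \geq \cdots \geq \lambda_{s,N}$, $G_{\mathrm{SNR}} \geq 1$. Such an SNR gain
is optimal [19].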
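The SNR gain can be evaluated directly from the eigenvalues of a signal covariance matrix. The following sketch uses hypothetical covariance matrices and a hypothetical function name `snr_gain`; it also recomputes the gain as the ratio of the two SNR expressions.

```python
import numpy as np


def snr_gain(R_s):
    """N * lambda_{s,1} / sum_i lambda_{s,i} for a signal covariance matrix R_s."""
    lam_s = np.linalg.eigvalsh(R_s)                 # ascending eigenvalues
    return float(R_s.shape[0] * lam_s[-1] / lam_s.sum())


N = 4
gain_white = snr_gain(np.eye(N))                    # white signal: all eigenvalues equal
gain_coherent = snr_gain(np.ones((N, N)))           # fully correlated signal: rank one

idx = np.arange(N)
R_s = 0.9 ** np.abs(idx[:, None] - idx[None, :])    # hypothetical non-white signal
sigma2 = 2.0                                        # assumed noise variance
lam_s = np.linalg.eigvalsh(R_s)
snr_x = lam_s.sum() / (N * sigma2)                  # SNR of x[n], all noise eigenvalues sigma2
snr_hat = lam_s[-1] / sigma2                        # SNR along the leading eigenvector
gain_from_ratio = snr_hat / snr_x
```

The gain ranges from 1 for a white signal, where no direction is preferred, to $N$ for a signal whose energy lies entirely along one eigenvector, and it does not depend on the noise variance.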

This is the foundation of the FLA and FTM. They use the 
fact that consecutive features of WSS signal are similar, while 
consecutive features of noise are random. 



B. Implementation Issue 

The major processing load of our algorithms lies in the
feature extraction. The processing delay of
the feature extraction can be divided into two steps. The first
is the computation of the covariance matrix $\hat{\mathbf{R}}_{x,m}$ in (13) and the
second is the eigenvector calculation. Since the computation of
$\hat{\mathbf{R}}_{x,m}$ can be done in real time, the major processing delay
lies in the eigenvector calculation. In the feature extraction,
we only need the leading eigenvector, so we do
not need the complete eigen-decomposition, which has a
computational complexity of $O(N^3)$. Recently, a fast PCA
algorithm for fixed-point implementation has been proposed
with computational complexity $O(N^2)$ [20], minimizing the
processing delay. Being able to compute (13) in real time is a
major advantage compared with spectral methods using the
fast Fourier transform (FFT). The FFT can only be calculated
once all $N_s$ samples have been captured, and its computational complexity
is $O(N_s \log N_s)$. Because usually $N_s \gg N$, the feature
extraction has much less delay than the FFT. Currently we have
implemented the algorithms on a field programmable gate array
(FPGA) and a digital signal processor (DSP) [21], [22].
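To illustrate how the leading eigenvector can be obtained without a complete eigen-decomposition, the sketch below uses plain power iteration, whose cost is one matrix-vector product, $O(N^2)$, per iteration. This is a generic textbook technique used here as a stand-in; it is not the fast PCA algorithm of [20], and the function name, iteration count and test matrix are assumptions.

```python
import numpy as np


def leading_eigenvector_power(R, n_iter=200, tol=1e-12):
    """Approximate the leading eigenvector of a positive semidefinite matrix R."""
    v = np.ones(R.shape[0]) / np.sqrt(R.shape[0])    # simple deterministic start
    for _ in range(n_iter):
        w = R @ v                                    # O(N^2) work per iteration
        w = w / np.linalg.norm(w)
        converged = np.linalg.norm(w - v) < tol
        v = w
        if converged:
            break
    return v


N = 8
idx = np.arange(N)
R_s = 0.9 ** np.abs(idx[:, None] - idx[None, :])     # hypothetical signal covariance
R_x = R_s + 4.0 * np.eye(N)                          # plus white noise of variance 4

phi_power = leading_eigenvector_power(R_x)
phi_exact = np.linalg.eigh(R_x)[1][:, -1]            # full decomposition, for comparison
agreement = abs(float(phi_power @ phi_exact))
```

The number of iterations needed depends on the ratio of the two largest eigenvalues, so convergence slows when that ratio approaches one.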

IV. Simulation Results 

In this section, we first use simulated data to show that the signal feature
can be extracted under unknown low SNR. Then, real-world
captured data is used to show that the signal feature can be
learned blindly and is stable over time. Finally, we compare
the detection performance of the FTM with that of the MME at very
low SNR, using the same real-world data. In the comparison,
the same covariance matrix is used, but the FTM has the learned
feature as prior knowledge.

A. Feature Robustness Test Against Noise 

Here we give an example of feature extraction under
unknown low SNR. $s[n]$ has constant power spectral density
in the frequency band from 0.5 MHz to 1.5 MHz. $x[n]$ is
the noisy signal with an unknown amount of noise. As shown
in Fig. 3, no spectral information of $x[n]$ can be extracted in
the frequency domain. We use the FLA to extract the features
from the noisy $x[n]$ and the noise-free $s[n]$. As can be seen in
Fig. 4, the two features are very similar, with a similarity as
high as 94%. As a result, the feature is very robust against noise
and there is no noise uncertainty in feature extraction.

B. Feature Learning Test with Real-world Data 

Then we use real-world data to demonstrate that the signal
feature is very stable over time while the noise feature is
random. Field measurements of DTV made in Washington D.C.
[8] are used as the PU signal. Simulated WGN samples are
used as the noise. The captured signal has a duration of about 25 seconds.
All synchronization information of the DTV signal is unknown to
the SU receiver. The receiver SNR and the communication
channel between the transmitter and receiver are also unknown.
However, we do know that the received SNR varies at
the receiver and that the channel has slow fading. We use the FLA
to calculate the similarities of consecutive features for both
signal and noise over the 25 seconds. We set $N_s = 10^5$
and $N = 64$. The corresponding duration of each segment is
approximately 4.6 ms. Fig. 5 shows the similarity of consecutive
features versus time. With $T_e = 90\%$, $\rho_{i,i+1} > T_e$ for
99.46% of the time when the PU signal exists. Moreover,
the similarity between the features of the first sensing segment
and the last sensing segment is as high as 99.98%, showing
that the signal feature is very stable and almost unchanged over
25 seconds. When the PU signal does not exist, $\rho_{i,i+1} > T_e$ for
only 0.92% of the time. More sophisticated learning algorithms
will be developed for the feature learning process to obtain the
signal feature in a robust and fast manner.

Fig. 3. Spectrum of $x[n]$ and $s[n]$.

Fig. 4. Features of $x[n]$ and $s[n]$.

C. ROC Curves for the FTM and the MME 

We use one segment of the previous DTV data samples as
the clean received PU signal $s[n]$ and add noise with variance
$\sigma_n^2$ to emulate $w[n]$. In the simulation, the signal feature is
the prior knowledge. The sensing time is set to approximately 4.6
ms with $N_s = 10^5$ and $N = 64$. WGN is added according
to the different SNR levels. We compare the results of both
algorithms. The MME uses the covariance matrix's max-min
eigenvalue ratio, $\lambda_{\max}/\lambda_{\min}$, for detection [10], and uses no
prior knowledge. $N_s$ and $N$ are set the same for the two
algorithms. In the simulation, we run both algorithms on
the same signal and noise and repeat the simulation 1000
times. Fig. 6 shows $P_d$ versus SNR, with $P_f = 10\%$. It
can be seen that to reach $P_d \approx 100\%$, the minimum required
SNR for the FTM is about 2 dB lower than that of the MME.
Note that in our simulations, all $T_f$ set by the FTM to get
$P_f = 10\%$ are very stable across different SNRs. This is because
$T_f$ is independent of SNR, signal energy and noise energy,
and the FTM does not have the noise uncertainty problem. Fig. 7
shows the receiver operating characteristic (ROC) curves when
SNR = -22 dB. At $P_f = 10\%$, the FTM has $P_d = 80\%$, while
the MME only has $P_d \approx 42\%$. This shows the advantage of using
the prior knowledge.

Fig. 5. Similarity of consecutive features of PU signal and noise in 25
seconds.
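The MME test statistic and the way a detection probability is read off at a fixed false alarm rate can be sketched as follows. The threshold rule (an empirical quantile of the noise-only statistics) is an assumption about how the operating point is chosen, and the function names and the numbers standing in for repeated trials are hypothetical.

```python
import numpy as np


def mme_statistic(R_hat):
    """Ratio of the largest to the smallest eigenvalue of a covariance matrix."""
    eigvals = np.linalg.eigvalsh(R_hat)              # ascending order
    return float(eigvals[-1] / eigvals[0])


def empirical_pd(stats_h0, stats_h1, pf):
    """Detection probability at the threshold giving false alarm rate pf."""
    threshold = np.quantile(stats_h0, 1.0 - pf)      # exceeded by a fraction pf of H0 trials
    return float(np.mean(np.asarray(stats_h1) > threshold))


# Hypothetical statistics standing in for repeated noise-only and signal trials.
stats_h0 = np.arange(100.0)
stats_h1 = np.array([10.0, 50.0, 95.0, 120.0])
pd_at_10_percent = empirical_pd(stats_h0, stats_h1, pf=0.10)

ratio_white = mme_statistic(np.eye(3))               # equal eigenvalues
ratio_colored = mme_statistic(np.diag([1.0, 2.0, 4.0]))
```

The same `empirical_pd` routine applies to either detector: for the FTM the statistic is the similarity to the learned feature, and for the MME it is the eigenvalue ratio.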

V. Conclusions 

Signal feature is location dependent. We propose to learn 
signal feature blindly and use it for spectrum sensing. We 
define the signal feature as the leading eigenvector of signal's 
sample covariance matrix, because it is a robust approximation 
of original signal against noise and optimum in effective SNR, 
based on DKLT properties. We propose the FLA for blind 
feature learning and the FTM for spectrum sensing with the 
signal feature as the prior knowledge. Since our algorithms 
do not depend on SNR or the noise energy, noise uncertainty 
problem is successfully avoided. We use simulated data and 
real-world data to demonstrate feature's robustness against 
noise and its stability over time. Detection performance of the 
FTM in low SNR is compared with MME, which is totally 
blind. Simulation results show that to achieve $P_d \approx 100\%$ and
$P_f = 10\%$, the minimum required SNR for the FTM is about
2 dB lower than that of the MME.

Fig. 6. $P_d$ versus SNR for FTM and MME. $N_s = 10^5$, $N = 64$.

This is only the beginning of our work. Further research top- 
ics include implementation, sophisticated feature learning and 
quantizing the thresholds, etc. In addition, feature extracted 
by DKLT is optimum only in the context of linear transforms. 
When signal has non-linear structures, non-linear methods like 
Kernel-PCA [23] and manifold learning [24] can be the next
powerful tools to be explored. 